feat: Add OpenAI embedding and query expansion support #116
Conversation
Great, I was looking for this. But the rerank in there doesn't support API calls?
This is great! :) Good job man 🔥
Port PR tobi#116 (tobi/qmd) to current main, adapting to the refactored codebase. Adds OpenAI as an alternative to local GGUF models, fixing the ARM64 segfault during hybrid search (issue tobi#68).

Changes:
- New src/openai-llm.ts: OpenAI API client (embed, embedBatch, rerank, expandQuery) with exponential backoff and rate limiting
- llm.ts: setEmbeddingConfig(), getDefaultEmbeddingLLM(), isUsingOpenAI()
- collections.ts: EmbeddingProviderConfig type, getEmbeddingConfig()
- store.ts: Provider-aware embedding, chunking (tiktoken), expand, rerank
- qmd.ts: Startup config loading, provider-aware embed command
- package.json: openai + tiktoken dependencies

Config via ~/.config/qmd/index.yml:

embedding:
  provider: openai
  openai:
    model: text-embedding-3-small

Or env: QMD_OPENAI=1 + OPENAI_API_KEY
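The "exponential backoff and rate limiting" mentioned above could look roughly like the sketch below. This is illustrative only; `backoffDelay` and `withRetries` are hypothetical names, not the PR's actual API, and the base/cap values are assumptions.

```typescript
// Hypothetical retry helper: exponential backoff with full jitter,
// retrying only on HTTP 429 (rate limit) responses.
function backoffDelay(attempt: number, baseMs = 500, capMs = 30_000): number {
  // Exponential growth capped at capMs: base * 2^attempt.
  const exp = Math.min(capMs, baseMs * 2 ** attempt);
  // "Full jitter": pick uniformly in [0, exp) so concurrent clients
  // don't retry in lockstep.
  return Math.random() * exp;
}

async function withRetries<T>(fn: () => Promise<T>, maxAttempts = 5): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      // Only retry rate-limit errors, and give up after maxAttempts.
      if (err?.status !== 429 || attempt + 1 >= maxAttempts) throw err;
      await new Promise((r) => setTimeout(r, backoffDelay(attempt)));
    }
  }
}
```

Full jitter keeps retries spread out even when many batch requests hit the RPM limit at once, which pairs with the inter-batch delay the PR also adds.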
Love this!
Force-pushed from 4869849 to a4f8415
Can one change
Adds support for using OpenAI's text-embedding-3-small model as an
alternative to local llama-cpp embeddings.
Changes:
- New openai-llm.ts: OpenAI API client implementing LLM interface
- llm.ts: Embedding config management, getDefaultEmbeddingLLM()
- collections.ts: EmbeddingProviderConfig for YAML config schema
- store.ts: Use configurable embedding LLM, skip local model for
query expansion/rerank when using OpenAI
- qmd.ts: Load embedding config on startup
- package.json: Add openai dependency
- README.md: Documentation for OpenAI embeddings
Configuration (in ~/.config/qmd/index.yml):

embedding:
  provider: openai
  openai:
    api_key: sk-... # Optional, falls back to OPENAI_API_KEY env
    model: text-embedding-3-small # Optional, this is the default
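A minimal sketch of how the provider switch described above might resolve, assuming the YAML maps onto an `EmbeddingProviderConfig` shape and that the `QMD_OPENAI=1` env override wins over the config file; the function name and return shape are illustrative, not the PR's exact identifiers.

```typescript
// Illustrative config type mirroring the YAML above.
type EmbeddingProviderConfig = {
  provider?: "local" | "openai";
  openai?: { api_key?: string; model?: string };
};

// Hypothetical resolver: env override first, then YAML, local by default.
function resolveEmbeddingProvider(
  cfg: EmbeddingProviderConfig | undefined,
  env: Record<string, string | undefined>
): { provider: "local" | "openai"; model?: string } {
  const useOpenAI = env.QMD_OPENAI === "1" || cfg?.provider === "openai";
  if (!useOpenAI) return { provider: "local" };
  return {
    provider: "openai",
    // The commit names text-embedding-3-small as the default model.
    model: cfg?.openai?.model ?? "text-embedding-3-small",
  };
}
```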
Benefits:
- Much faster embedding (~10x vs local models on CPU)
- No GPU/VRAM requirements
- More reliable (no local model loading issues)
- Cost: ~$0.02 per 1M tokens
- OpenAI embeddings (text-embedding-3-small, 1536d) via QMD_OPENAI=1
- Query expansion with gpt-4o-mini (~200ms vs 30s local)
- Tiktoken for fast tokenization (no model loading)
- Exponential backoff with jitter for rate limits (429)
- Inter-batch delay (150ms) to avoid hitting RPM limits
- Performance: search 3-5s (was 30-60s), embed ~10min (was 2hrs)

Files: openai-llm.ts, llm.ts, store.ts, qmd.ts
Deps: openai, tiktoken
Replace the rerank() stub with a real listwise reranker using gpt-4o-mini.

- Sends top candidates with query to gpt-4o-mini as a ranking task
- Parses comma-separated index output, handles missing/duplicate indices
- Skips API call for ≤2 documents (not worth the latency)
- Falls back to original order on API failure
- Cost: ~$0.001 per rerank call
- Updated qmd.ts to route through OpenAI reranker instead of skipping

The full qmd query pipeline with OpenAI now:
1. Query expansion (gpt-4o-mini)
2. BM25 + vector search (parallel)
3. RRF fusion
4. Cross-encoder reranking (gpt-4o-mini) ← NEW
5. Position-aware blending
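The index-parsing step described above (tolerating missing and duplicate indices in the model's comma-separated output) could be sketched like this; `parseRanking` is a hypothetical name, not the PR's actual function.

```typescript
// Parse a model ranking like "2, 0, 1": drop non-numeric, out-of-range,
// and duplicate indices, then append any indices the model omitted so
// every candidate keeps a position.
function parseRanking(output: string, numDocs: number): number[] {
  const seen = new Set<number>();
  const order: number[] = [];
  for (const tok of output.split(/[,\s]+/)) {
    const i = Number.parseInt(tok, 10);
    if (Number.isInteger(i) && i >= 0 && i < numDocs && !seen.has(i)) {
      seen.add(i);
      order.push(i);
    }
  }
  // Missing indices fall back to their original order at the end.
  for (let i = 0; i < numDocs; i++) {
    if (!seen.has(i)) order.push(i);
  }
  return order;
}
```

Appending omitted indices in original order is what makes "falls back to original order on API failure" cheap: a totally unparseable response degrades to the identity permutation.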
Accept comma-separated collection names in -c flag for cross-collection search. All three search modes (search, vsearch, query) now support querying multiple collections simultaneously.

Changes:
- resolveCollectionFilter() helper parses and validates comma-separated names
- searchFTS() accepts string | string[] for collection filtering
- searchVec() accepts string | string[] for collection filtering
- SQL uses IN clause for multi-collection filtering
- Updated interface types and test for new parameter types

Usage:
qmd search 'auth' -c repo-a,repo-b
qmd vsearch 'auth patterns' -c docs,examples
qmd query 'OAuth implementation' -c project,patterns,docs

This enables Shad's multi-vault search to pass all vault collections in a single qmd call instead of running separate searches per collection.
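A sketch of the parse-and-filter path described above, assuming the helper splits on commas, validates names, and the SQL layer binds one placeholder per collection in an IN clause; the validation regex and `collectionWhereClause` helper are assumptions, not the PR's exact code.

```typescript
// Hypothetical version of the -c flag helper: split, trim, validate.
function resolveCollectionFilter(flag?: string): string[] {
  if (!flag) return [];
  const names = flag.split(",").map((s) => s.trim()).filter(Boolean);
  for (const n of names) {
    // Assumed validation: collection names are word chars and hyphens.
    if (!/^[\w-]+$/.test(n)) throw new Error(`invalid collection name: ${n}`);
  }
  return names;
}

// Build the IN clause with bound parameters (never interpolate names
// into SQL directly).
function collectionWhereClause(names: string[]): { sql: string; params: string[] } {
  if (names.length === 0) return { sql: "", params: [] };
  const placeholders = names.map(() => "?").join(", ");
  return { sql: `AND collection IN (${placeholders})`, params: names };
}
```

With this shape, `-c repo-a,repo-b` becomes `AND collection IN (?, ?)` with `["repo-a", "repo-b"]` as bound parameters, and the single-collection case is just the one-element list.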
Force-pushed from 7a718f6 to fc2f137
Thanks for the patience on this. I've refreshed it.

Update (2026-03-28): I also got feedback from @alexleach, who runs OpenAI-compatible remote endpoints in minimal Docker environments. This adds a configurable OPENAI_BASE_URL. Waiting for him to re-submit his PRs.

Cheers
PR rebased onto main
Summary
Optional OpenAI integration for embeddings and query expansion. Dramatically faster for users who prefer API-based inference over local models.
Features
• OpenAI Embeddings — text-embedding-3-small (1536 dims), native batch API, ~$0.02/1M tokens
• OpenAI Query Expansion — gpt-4o-mini for lex/vec/hyde variants
• OpenAI Reranking — API-based reranking replaces local qwen3-reranker, eliminating model download and GGUF inference overhead
• Tiktoken chunking — eliminates model load time for tokenization
• Robust retry logic — exponential backoff with jitter for rate limits
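The tiktoken chunking feature above amounts to packing paragraphs into chunks by token budget. The sketch below shows the packing logic only; the PR uses tiktoken's BPE encoder for `countTokens`, while here a crude whitespace stand-in keeps the sketch dependency-free, so real token counts will differ.

```typescript
// Stand-in tokenizer: replace with tiktoken's encoder in real use.
function countTokens(text: string): number {
  return text.split(/\s+/).filter(Boolean).length;
}

// Greedy packing: start a new chunk whenever adding the next paragraph
// would exceed the token budget.
function chunkByTokens(paragraphs: string[], maxTokens: number): string[] {
  const chunks: string[] = [];
  let current: string[] = [];
  let budget = 0;
  for (const p of paragraphs) {
    const n = countTokens(p);
    if (budget + n > maxTokens && current.length > 0) {
      chunks.push(current.join("\n\n"));
      current = [];
      budget = 0;
    }
    current.push(p);
    budget += n;
  }
  if (current.length > 0) chunks.push(current.join("\n\n"));
  return chunks;
}
```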
Usage
export OPENAI_API_KEY="sk-..."
export QMD_OPENAI=1
qmd embed -f          # Re-embed with OpenAI
qmd search "query"

Design
• Opt-in — local models remain the default
• Graceful fallback — errors don't crash, just skip
• Replace local reranking with OpenAI — no GGUF model download or local inference needed
• No breaking changes — existing workflows unchanged
Files Changed
• src/openai-llm.ts — new OpenAI LLM implementation
• src/llm.ts — embedding config, provider switching
• src/store.ts — tiktoken chunking integration
• src/qmd.ts — QMD_OPENAI env var support
Dependencies
• openai — API client
• tiktoken — fast BPE tokenization